A Bayesian View of the Poisson-Dirichlet Process
Abstract
The two-parameter Poisson-Dirichlet Process (PDP), a generalisation of the Dirichlet Process, is increasingly being used for probabilistic modelling in discrete areas such as language technology, bioinformatics, and image analysis. There is a rich literature about the PDP and its derivative distributions such as the Chinese Restaurant Process. This article reviews some of the basic theory and then the major results needed for Bayesian modelling of discrete problems, including details of priors, posteriors and computation.
The PDP is a generalisation of the Dirichlet distribution that allows one to build distributions over partitions, both finite and countably infinite. The PDP has two other remarkable properties: first, it is partially conjugate to itself, which allows one to build hierarchies of PDPs; and second, using a marginalised relative, the Chinese Restaurant Process (CRP), one gets fragmentation and clustering properties that let one layer partitions to build trees. This article presents the basic theory for understanding the notion of partitions and distributions over them, the PDP and the CRP, and the important properties of conjugacy, fragmentation and clustering, as well as some key related properties such as consistency and convergence.
This article also presents a Bayesian interpretation of the Poisson-Dirichlet process: it is based on an improper and infinite-dimensional Dirichlet distribution. This interpretation requires technicalities of priors, posteriors and Hilbert spaces, but conceptually it means we can understand the process as just another Dirichlet, and thus all its sampling properties emerge naturally. The theory of PDPs is usually presented for continuous distributions (more generally referred to as non-atomic distributions); however, when the PDP is applied to discrete distributions, its remarkable conjugacy property emerges. This context and the basic results are also presented, as well as techniques for computing the second order Stirling numbers that occur in the posteriors for discrete distributions.
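As a rough illustration of the kind of computation the abstract refers to, the sketch below tabulates generalised Stirling numbers in log space. It assumes the commonly used recurrence S(n+1, m) = S(n, m-1) + (n - m·a)·S(n, m) with S(0, 0) = 1, where a is the discount parameter of the PDP. The recurrence, the function name `log_generalised_stirling` and the parameter name `discount` are assumptions made for this example and are not taken from the article itself.

```python
import math


def log_generalised_stirling(n_max, discount):
    """Return table[n][m] = log S(n, m) for 0 <= m <= n <= n_max.

    Assumed recurrence (not quoted from the article):
        S(n+1, m) = S(n, m-1) + (n - m * discount) * S(n, m),  S(0, 0) = 1,
    with 0 <= discount < 1. Entries that are zero are stored as -inf.
    """
    neg_inf = float("-inf")
    table = [[neg_inf] * (n_max + 1) for _ in range(n_max + 1)]
    table[0][0] = 0.0  # log S(0, 0) = log 1
    for n in range(n_max):
        for m in range(1, n + 2):
            terms = []
            # First term of the recurrence: S(n, m-1).
            if table[n][m - 1] > neg_inf:
                terms.append(table[n][m - 1])
            # Second term: (n - m * discount) * S(n, m), nonzero only if m <= n.
            weight = n - m * discount
            if m <= n and weight > 0 and table[n][m] > neg_inf:
                terms.append(math.log(weight) + table[n][m])
            if terms:
                # Log-sum-exp keeps the sum stable for large n.
                top = max(terms)
                table[n + 1][m] = top + math.log(
                    sum(math.exp(t - top) for t in terms)
                )
    return table


table = log_generalised_stirling(4, 0.0)
stirling_4_2 = math.exp(table[4][2])
```

With `discount` set to 0 the recurrence reduces to that of the unsigned Stirling numbers of the first kind, which gives a simple sanity check on the table. Working in log space is a design choice here, since the raw values grow roughly factorially with n.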
Related articles
Introducing of Dirichlet process prior in the Nonparametric Bayesian models frame work
Statistical models are used to learn about the mechanism that generates the data. Often it is assumed that the random variables y_i, i=1,…,n, are samples from a probability distribution F which belongs to a parametric class of distributions. However, in practice, a parametric model may be inappropriate to describe the data. In this setting, the parametric assumption could be r...
Bayesian change point estimation in Poisson-based control charts
Precise identification of the time when a process has changed enables process engineers to search for a potential special cause more effectively. In this paper, we develop change point estimation methods for a Poisson process in a Bayesian framework. We apply Bayesian hierarchical models to formulate the change point where there exists a step change, a linear trend and a known multip...
Large Sample Asymptotics for the Two Parameter Poisson Dirichlet Process
Abstract: This paper explores large sample properties of the two-parameter (α, θ) Poisson-Dirichlet Process in two contexts. In a Bayesian context of estimating an unknown probability measure, viewing this process as a natural extension of the Dirichlet process, we explore the consistency and weak convergence of the two-parameter Poisson-Dirichlet posterior process. We also establish the we...
Distributions of Linear Functionals of Two Parameter Poisson–Dirichlet Random Measures
The present paper provides exact expressions for the probability distributions of linear functionals of the two-parameter Poisson–Dirichlet process PD(α, θ). We obtain distributional results yielding exact forms for density functions of these functionals. Moreover, several interesting integral identities are obtained by exploiting a correspondence between the mean of a Poisson–Dirichlet proces...
A Bayesian Review of the Poisson-Dirichlet Process
The two-parameter Poisson-Dirichlet process, also known as the Pitman-Yor Process and related to the Chinese Restaurant Process, is a generalisation of the Dirichlet Process and is increasingly being used for probabilistic modelling in discrete areas such as language and images. This article reviews the theory of the Poisson-Dirichlet process in terms of its consistency for estimation, the co...